14 html escaping and unescaping is slow and incomplete#17
Conversation
Updated gleam.toml and manifest.toml to include houdini and odysseus as new dependencies. Bumped package version to 1.2.3 to reflect these changes.
Replaces manual HTML escaping and unescaping logic with calls to the houdini and odysseus libraries, simplifying the code and improving maintainability.
Introduces unit tests for str.escape_html and str.unescape_html, covering basic escaping, unescaping, roundtrip conversions, and numeric entity handling.
Documented changes for version 1.2.3, including replacement of HTML escape/unescape implementations, new dependencies, and added tests. Contributions and suggestions are also acknowledged.
Changed the README logo image source to use a raw.githubusercontent URL for better Hexdocs compatibility. Added a note about escape_html and unescape_html library changes in version 1.2.3. Updated CHANGELOG to document these changes and credit the suggestion.
Introduces a GitHub Actions workflow to run tests and formatting checks on push and pull request events for master and main branches. The workflow sets up the BEAM environment, installs dependencies, runs tests, and checks code formatting.
Introduces a CONTRIBUTING.md file outlining steps for contributing, commit conventions, PR checklist, setup instructions, and testing recommendations for the project.
Introduces comprehensive tests for str.escape_html and str.unescape_html, covering basic entities, numeric and named entities, malformed and unknown entities, combined entities, Unicode/emoji roundtrips, and idempotence/double escaping.
Moved the gleam/list import to the top and reformatted a long assertion in malformed_and_unknown_entity_test for improved readability. (gleam format)
Deleted the 'links' field from gleam.toml as it is no longer needed. This cleans up the configuration file.
Introduces a deterministic fuzz test to verify that str.escape_html and str.unescape_html roundtrip correctly and that the escaped output does not contain raw angle brackets or quotes.
There was a problem hiding this comment.
Pull request overview
This pull request replaces custom HTML escaping/unescaping implementations with specialized libraries (houdini and odysseus) to improve performance and provide comprehensive HTML entity support, while adding extensive tests and CI infrastructure.
- Replaces custom
escape_htmlandunescape_htmlimplementations with library calls for better performance and comprehensive entity support - Adds comprehensive test coverage including basic, extended, and fuzz tests for HTML entity handling
- Adds CI workflow, contributing guidelines, and updates documentation
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/str/core.gleam | Replaces custom HTML escape/unescape implementations with houdini.escape and odysseus.unescape library calls |
| gleam.toml | Adds houdini and odysseus dependencies; bumps version to 1.2.3 |
| manifest.toml | Adds package manifests for new dependencies |
| test/str_html_escape_test.gleam | Adds basic unit tests for HTML escape/unescape with roundtrip and numeric entity tests |
| test/str_html_escape_extended_test.gleam | Adds extended tests for edge cases including malformed entities, Unicode, and idempotence |
| test/str_html_escape_fuzz_test.gleam | Adds deterministic fuzz testing for HTML escape/unescape roundtrips |
| README.md | Updates logo URL to raw GitHub URL and documents the library changes |
| CHANGELOG.md | Documents the 1.2.3 release changes |
| CONTRIBUTING.md | Adds contributing guidelines with setup, commit conventions, and PR checklist |
| .github/workflows/ci.yml | Adds GitHub Actions workflow for automated testing and formatting checks |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert string.contains(escaped, "\"") == False | ||
| assert string.contains(escaped, "'") == False | ||
|
|
||
| True |
There was a problem hiding this comment.
The function returns True but test functions in gleeunit should return Nil. Additionally, the run_cfg calls on lines 65-67 have return values that are ignored. Consider removing the return value from both run_cfg and this test function.
| list.fold(cases, True, fn(_, s) { | ||
| let escaped = str.escape_html(s) | ||
| let unescaped = str.unescape_html(escaped) | ||
| assert unescaped == s | ||
| True | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
The first parameter in the fold function is unused (indicated by the underscore). The function returns True but this return value is never used. The entire fold seems unnecessary here - consider using list.each instead, or restructure to avoid the unused parameter and return value.
| list.fold(cases, True, fn(_, s) { | |
| let escaped = str.escape_html(s) | |
| let unescaped = str.unescape_html(escaped) | |
| assert unescaped == s | |
| True | |
| }) | |
| } | |
| list.each(cases, fn(s) { | |
| let escaped = str.escape_html(s) | |
| let unescaped = str.unescape_html(escaped) | |
| assert unescaped == s | |
| }) | |
| } |
| - [ ] Tests added/updated | ||
| - [ ] `gleam format` & `gleam test` pass | ||
| - [ ] Update `CHANGELOG.md` if behaviour changes | ||
| - [ ] Document noteworthy changes in `README.md` , docs/ or examples/ |
There was a problem hiding this comment.
There is an extra space before the comma in "README.md , docs/". It should be "README.md, docs/" without the space before the comma.
| - [ ] Document noteworthy changes in `README.md` , docs/ or examples/ | |
| - [ ] Document noteworthy changes in `README.md`, docs/ or examples/ |
| python3 scripts/generate_character_tables.py | ||
| ``` | ||
|
|
||
| Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation‑friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details. |
There was a problem hiding this comment.
The hyphen in "allocation‑friendly" is a non-breaking hyphen (U+2011) instead of a regular ASCII hyphen. This is inconsistent with the same term in CHANGELOG.md line 8 which uses a regular hyphen. Consider using a regular hyphen for consistency.
| Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation‑friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details. | |
| Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation-friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details. |
| run_cfg(42, 50, tokens) | ||
| run_cfg(123, 200, tokens) | ||
|
|
||
| True |
There was a problem hiding this comment.
The test function returns True but this value is never used. In Gleam tests using gleeunit, test functions should return Nil. The function should not return True at the end.
This pull request introduces significant improvements to HTML escaping and unescaping in the
strlibrary, replacing custom implementations with thehoudiniandodysseuslibraries for better performance and comprehensive entity support. It also adds new dependencies, updates documentation, and includes extensive new tests for HTML entity handling.HTML escaping/unescaping improvements:
escape_htmlimplementation insrc/str/core.gleamwithhoudini.escapefor faster, allocation-friendly HTML escaping.unescape_htmlimplementation insrc/str/core.gleamwithodysseus.unescape, supporting named entities as well as numeric decimal and hex entities.houdiniandodysseusas dependencies ingleam.toml.Testing enhancements:
test/str_html_escape_test.gleam,test/str_html_escape_extended_test.gleam, andtest/str_html_escape_fuzz_test.gleam. [1] [2] [3]Documentation and workflow updates:
CONTRIBUTING.mdwith guidelines for contributing, testing, and PRs.README.mdto use a raw GitHub URL for Hexdocs compatibility.